An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of the multiple datasets that are commonly available and thus cannot fully exploit available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.
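The single-reference setting described above can be illustrated with a minimal toy sketch. It is not the authors' implementation: the 1D Gaussian features, the 20% signal fraction, the plain logistic regression, and the names `sample` and `auc` are all assumptions made for illustration. A classifier is trained only on dataset labels (reference vs. target), then checked against true signal and background labels that it never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, signal_fraction):
    """Draw n toy 1D events: background ~ N(0, 1), signal ~ N(2, 1) (assumed shapes)."""
    is_signal = rng.random(n) < signal_fraction
    x = rng.normal(0.0, 1.0, n) + 2.0 * is_signal
    return x, is_signal

# Reference: background only. Target: background plus an assumed 20% signal admixture.
x_ref, _ = sample(20000, 0.0)
x_tgt, _ = sample(20000, 0.2)

x = np.concatenate([x_ref, x_tgt])
# Labels say which dataset an event came from, not whether it is signal.
y = np.concatenate([np.zeros_like(x_ref), np.ones_like(x_tgt)])

# Logistic regression fitted by plain gradient descent.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

def auc(scores_pos, scores_neg):
    """Probability that a random positive event outscores a random negative one."""
    scores = np.concatenate([scores_pos, scores_neg])
    ranks = scores.argsort().argsort() + 1
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Evaluate on fresh events with true labels, which were never used in training.
x_sig = rng.normal(2.0, 1.0, 5000)
x_bkg = rng.normal(0.0, 1.0, 5000)
signal_auc = auc(w * x_sig + b, w * x_bkg + b)
print(f"learned weight: {w:.3f}, signal-vs-background AUC: {signal_auc:.3f}")
```

In this toy, the classifier separates signal from background even though it was trained only to tell the two datasets apart, which is the property that reference-versus-target methods rely on.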
translated by 谷歌翻译
In collider-based particle and nuclear physics experiments, data are produced at such extreme rates that only a subset can be recorded for later analysis. Typically, algorithms select individual collision events for preservation and store the complete experimental response. A relatively new alternative strategy is to additionally save a partial record for a larger subset of events, allowing for later specific analysis of a larger fraction of events. We propose a strategy that bridges these paradigms by compressing entire events for generic offline analysis but at a lower fidelity. An optimal-transport-based $\beta$ Variational Autoencoder (VAE) is used to automate the compression, and the hyperparameter $\beta$ controls the compression fidelity. We introduce a new approach for multi-objective learning functions by simultaneously learning a VAE appropriate for all values of $\beta$ through parameterization. We present an example use case, a di-muon resonance search at the Large Hadron Collider (LHC), where we show that simulated data compressed by our $\beta$-VAE has enough fidelity to distinguish distinct signal morphologies.
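For orientation, the role of $\beta$ can be seen in the generic $\beta$-VAE objective, written here in its standard textbook form. The symbols are assumptions for illustration and the paper's exact normalization may differ: $q_\phi(z \mid x)$ is the encoder, $\hat{x}_\theta(z)$ the decoder output, $p(z)$ the latent prior, and $D$ a reconstruction distance (optimal-transport-based, according to the abstract).

$$
\mathcal{L}_\beta(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\left[ D\big(x, \hat{x}_\theta(z)\big) \right] + \beta \, D_{\mathrm{KL}}\big( q_\phi(z \mid x) \,\|\, p(z) \big)
$$

In this convention, a larger $\beta$ weights the KL term more heavily, which pushes toward stronger compression at lower reconstruction fidelity; a smaller $\beta$ favors fidelity.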
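Score-based generative models are a new class of generative algorithms that can produce realistic images even in high-dimensional spaces, currently surpassing other state-of-the-art models across different benchmark categories and applications. In this work, we introduce CaloScore, a score-based generative model applied to calorimeter shower generation. Three different diffusion models are investigated using the Fast Calorimeter Simulation Challenge 2022 dataset. CaloScore is the first application of a score-based generative model in collider physics and is able to produce high-fidelity calorimeter images for all datasets, providing an alternative paradigm for calorimeter shower simulation.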
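Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of the examples used during training, an issue known as prior dependence. This is an undesirable property for a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine-learning-based calibration strategies. We demonstrate how recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.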
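Machine learning plays a crucial role in enhancing and accelerating the search for new fundamental physics. We review the state of machine learning methods and applications for new physics searches in the context of terrestrial high energy physics experiments, including the Large Hadron Collider, rare event searches, and neutrino experiments. While machine learning has a long history in these fields, the deep learning revolution (early 2010s) has yielded a qualitative shift in the scope and ambition of research. These modern machine learning developments are the focus of the present review.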
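We outline emerging opportunities and challenges for enhancing the utility of AI for scientific discovery. The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data and discovering patterns from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.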
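There is a growing need for anomaly detection methods that can broaden the search for new particles in a model-agnostic manner. Most proposals for new methods focus on signal sensitivity. However, selecting anomalous events is not enough; there must also be a strategy to provide context for the selected events. We propose the first complete strategy for unsupervised detection that includes both signal sensitivity and a data-driven method for background estimation. Our technique is built out of two simultaneously trained autoencoders that are forced to be decorrelated from each other. The method can be deployed offline for non-resonant anomaly detection and is also the first complete online-compatible anomaly detection strategy. We show that our method achieves excellent performance on a variety of signals prepared for the ADC2021 data challenge.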
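One common way that two decorrelated scores enable a data-driven background estimate is the ABCD method, sketched below. The abstract does not name the estimation method, so treating it as ABCD is an assumption, as are the uniform toy scores, the thresholds, and all variable names. When two scores are independent for background, the background count in the region where both are anomalous can be predicted from the three control regions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical anomaly scores from two decorrelated models on background-only events.
# Independent uniform draws stand in for the real autoencoder outputs.
score_1 = rng.random(200_000)
score_2 = rng.random(200_000)

cut_1, cut_2 = 0.9, 0.9  # illustrative thresholds, not taken from the source
pass_1 = score_1 > cut_1
pass_2 = score_2 > cut_2

n_a = np.sum(pass_1 & pass_2)    # signal region: both scores anomalous
n_b = np.sum(pass_1 & ~pass_2)   # control region: only the first score anomalous
n_c = np.sum(~pass_1 & pass_2)   # control region: only the second score anomalous
n_d = np.sum(~pass_1 & ~pass_2)  # control region: neither score anomalous

# Independence of the two scores implies N_A / N_B = N_C / N_D for background.
predicted_n_a = n_b * n_c / n_d
relative_error = abs(predicted_n_a - n_a) / n_a
print(f"observed N_A = {n_a}, predicted N_A = {predicted_n_a:.1f}")
```

If the scores were correlated for background, the prediction would be biased, which is one reason to enforce decorrelation during training.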
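Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. The methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions that must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on combinations of normalizing flows and generative adversarial networks.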
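Tracking is one of the most time-consuming aspects of event reconstruction at the Large Hadron Collider (LHC) and its high-luminosity upgrade (HL-LHC). Innovative detector technologies extend tracking to four dimensions by including timing in pattern recognition and parameter estimation. However, present and future hardware already provide additional information that is largely unused by existing track seeding algorithms. The shape of clusters provides an additional dimension for track seeding that can significantly reduce the combinatorial challenge of track finding. We use neural networks to show that cluster shapes can significantly reduce the rate of fake combinatorial backgrounds while preserving high efficiency. We demonstrate this using the information in cluster singlets, doublets, and triplets. Numerical results are presented with simulations from the TrackML challenge.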
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.